AI News

AI News

Don't miss any moment of global AI innovation

AI Daily

Daily three-minute AI industry trends

AI Timeline

AI industry milestones

Al Hardware

Lists all AI hardware products.

AI Monetization Guide

Latest Cases

AI monetization case sharing

Image Collection

AI image creation monetization cases

Video Collection

AI video creation monetization cases

Audio Collection

AI audio creation monetization cases

Content Collection

AI content writing monetization cases

AI Tutorials

Latest Tutorials

Free sharing of the latest AI tutorials

AI Product Rankings

AI Product Ranking

Shows total visits ranking of AI websites

AI Traffic Growth Ranking

Track fastest growing AI websites by traffic

AI Traffic Decline Ranking

Focus on AI websites with significant traffic drops

AI Weekly Ranking

Shows weekly visits ranking of AI websites

Popular Country Rankings

United States

AI websites most popular with US users

China

AI websites most popular with Chinese users

India

AI websites most popular with Indian users

Brazil

AI websites most popular with Brazilian users

Popular Category Rankings

Image Generation

Total visits ranking of AI image generation websites

Personal Assistant

Total visits ranking of AI personal assistant websites

Character Generation

Total visits ranking of AI character generation websites

Video Generation

Total visits ranking of AI video generation websites

Popular Open Source Data Rankings

AI Project Ranking

GitHub popular AI projects by total stars

AI Project Growth Ranking

GitHub popular AI projects by growth rate

AI Developer Ranking

GitHub popular AI developer ranking

AI Organization Ranking

GitHub popular AI organization ranking

Popular Open Source Categories

Deepseek

GitHub popular deepseek open source projects

TTS

GitHub popular TTS open source projects

LLM

GitHub popular LLM open source projects

ChatGPT

GitHub popular ChatGPT open source projects

AI Open Source Project Library

Overview

Overview of GitHub popular AI open source projects

Product Library Tool Navigation

Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

✓AI News
AI Tools

Type :

✓AI News
AI Tools

2025-04-14 08:37:09.AIbase

OpenGVLab Open-Sources InternVL3 Series of Multimodal Large Language Models

OpenGVLab has open-sourced the InternVL3 series of models, marking a new milestone in the field of Multimodal Large Language Models (MLLMs). The InternVL3 series comprises seven models ranging from 1B to 78B parameters, capable of handling text, images, and videos simultaneously, demonstrating superior overall performance.

OpenGVLab Open-Sources InternVL3 Series of Multimodal Large Language Models

2025-03-24 15:58:55.AIbase

Microsoft Unveils GeoMap-Bench to Advance Intelligent Understanding of Geological Maps

In geoscience, geological maps are crucial tools for understanding the Earth's surface and subsurface structures. However, interpreting these complex diagrams requires specialized knowledge and extensive experience. To enhance intelligence in this field, Microsoft Research Asia recently introduced GeoMap-Bench, a new benchmark dataset for evaluating the performance of multimodal large language models (MLLMs) in understanding geological maps. The launch of GeoMap-Bench marks a significant step forward in AI applications for geological map interpretation. Microsoft researchers, in collaboration with...

Microsoft Unveils GeoMap-Bench to Advance Intelligent Understanding of Geological Maps

2025-01-13 09:21:47.AIbase

Integrated AI Framework Sa2VA: Achieving Deep Understanding of Images and Videos

Driven by multimodal large language models (MLLMs), significant advancements have been made in tasks related to images and videos, including visual question answering, narrative generation, and interactive editing. However, achieving fine-grained understanding of video content still poses major challenges. These challenges involve tasks such as pixel-level segmentation, tracking with language descriptions, and visual question answering based on specific video prompts. Although current state-of-the-art video perception models excel in segmentation and tracking tasks, they still fall short in open language understanding and conversational capabilities.

Integrated AI Framework Sa2VA: Achieving Deep Understanding of Images and Videos

2024-10-28 14:42:03.AIbase

Meta Open Sources Long Video LLM Project LongVU: Filters Duplicate Frames for Efficient and Accurate Understanding of Long Video Content

Recently, the Meta AI team introduced LongVU, a novel spatio-temporal adaptive compression mechanism aimed at enhancing the language understanding capabilities of long videos. Traditional multimodal large language models (MLLMs) face limitations in context length when processing long videos, and LongVU was created to address this challenge. LongVU operates primarily by filtering duplicate frames and employing inter-frame token compression techniques to efficiently utilize context length, allowing it to reduce video data while preserving visual details.

Meta Open Sources Long Video LLM Project LongVU: Filters Duplicate Frames for Efficient and Accurate Understanding of Long Video Content

2024-10-08 11:18:05.AIbase

Apple Introduces MM1.5: A Revolution in Multimodal AI Models Redefining Intelligent Understanding?

Recently, Apple's AI research team launched their next-generation family of Multimodal Large Language Models (MLLMs) - MM1.5. This series of models can integrate various data types such as text and images, showcasing new capabilities of AI in understanding complex tasks. Tasks like visual question answering, image generation, and multimodal data interpretation can be better addressed with the help of these models. A major challenge for multimodal models is how to achieve effective interaction between different data types. Previous models often struggled in this aspect.

Apple Introduces MM1.5: A Revolution in Multimodal AI Models Redefining Intelligent Understanding?

2024-09-14 15:42:34.AIbase

Apple Aims to Leverage the UI-JEPA Model to Understand User Intent on Devices

As AI technology continues to advance, understanding user interfaces (UI) has become a key challenge in creating intuitive and useful AI applications. Recently, researchers at Apple introduced UI-JEPA in a new paper, an architecture designed for lightweight, device-side UI understanding that maintains high performance while significantly reducing the computational requirements for UI understanding. The challenge of UI understanding lies in processing cross-modal features, including images and natural language, to capture temporal relationships within UI sequences. Despite the complexity, multimodal large language models...

Apple Aims to Leverage the UI-JEPA Model to Understand User Intent on Devices

2024-07-01 09:14:16.AIbase

GPTPdf: Analyzing PDF Files with Multimodal Large Language Models Similar to GPT-4o

Recently, an open-source project called gptpdf has gained 1.1k stars on GitHub. It uses a VLLM model similar to GPT-4o to parse PDF files and convert them into Markdown format.gptpdf Product Entry: https://top.aibase.com/tool/gptpdf It is learned that the code for this project consists of only 293 lines, yet it can nearly perfectly parse and include content such as formatting, mathematical formulas, tables, images, charts, and more. The implementation steps of gptpdf are: 1) Use the PyMuPDF libr

GPTPdf: Analyzing PDF Files with Multimodal Large Language Models Similar to GPT-4o